Using Growing Self-Organising Maps to Improve the Binning Process in Environmental Whole-Genome Shotgun Sequencing
Abstract
Metagenomic projects using whole-genome shotgun (WGS) sequencing produce many unassembled DNA sequences and small contigs. The step of clustering these sequences, based on biological and molecular features, is called binning. A reported strategy for binning that combines oligonucleotide frequency and self-organising maps (SOM) shows high potential. We improve this strategy by identifying suitable training features, implementing a better clustering algorithm, and defining quantitative measures for assessing results. We investigated the suitability of di-, tri-, tetra-, and pentanucleotide frequencies. The results show that, unlike the other three, dinucleotide frequency is not a sufficiently strong signature for binning 10 kb DNA sequences. Furthermore, we observed that increasing the order of oligonucleotide frequency may degrade the assignment result in some cases, which suggests that an optimal oligonucleotide frequency may exist for each species. We replaced SOM with the growing self-organising map (GSOM), which gives comparable results with a 7%-15% speed improvement.
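As a rough illustration of the training features mentioned above, an oligonucleotide frequency vector for one DNA fragment could be computed as in the following sketch. The function name `oligo_frequencies`, the plain sliding-window counting, and the normalisation to relative frequencies are assumptions made for illustration; the abstract does not say how the frequencies were computed or normalised.

```python
from itertools import product


def oligo_frequencies(seq, k=4):
    """Return relative k-mer frequencies of seq in lexicographic k-mer order.

    Hypothetical helper: k=2, 3, 4, 5 correspond to di-, tri-, tetra-,
    and pentanucleotide frequencies.
    """
    seq = seq.upper()
    # Fixed ordering of all 4**k possible oligonucleotides.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    # Slide a window of width k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous bases such as N
            counts[window] += 1
    total = sum(counts.values())
    # Normalise so that fragments of different lengths are comparable.
    return [counts[m] / total if total else 0.0 for m in kmers]


# Example: a 256-dimensional tetranucleotide vector for a short fragment.
vector = oligo_frequencies("ACGTACGTTTGACCA", k=4)
```

Each fragment then becomes one input vector of length 4^k (16, 64, 256, or 1024), which is the kind of feature a SOM or GSOM would be trained on.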
Journal:
- Journal of Biomedicine and Biotechnology
Volume 2008
Publication date: 2008